13 research outputs found

    Processor allocator for chip multiprocessors

    Chip MultiProcessor (CMP) architectures, consisting of many cores connected through a Network-on-Chip (NoC), are becoming the main computing platforms for research and computer centers and, in the future, for commercial solutions. To use CMPs effectively, the operating system must support a multiuser environment in which many parallel jobs execute simultaneously. This is handled by the operating system's processor management system, which consists of two components: the Job Scheduler (JS) and the Processor Allocator (PA). The JS selects the next job to be executed, while the PA selects a set of processors for the job chosen by the JS. This thesis explores the PA architecture for the NoC-based CMP. It presents the idea of implementing the PA in hardware and integrating it on one die together with the processing elements of the CMP. Such an approach requires the PA to be fast as well as area and energy efficient, because it is only a small component of the CMP. The architecture of a hardware PA is presented. The main factor shaping its structure is the type of processor allocation algorithm employed inside. Thus, all important allocation techniques are investigated in depth and new schemes are proposed. All of them are compared using an experimentation system. The PA driven by the described allocation techniques is synthesized on an FPGA, and the crucial energy and area consumption together with performance parameters are extracted. The proposed CMP uses a NoC as its interconnection architecture. Therefore, all main NoC structures are studied and tested. The most important parameters, such as topology, flow control and routing algorithms, are presented and discussed. An energy model is proposed and described for the proposed NoC structures. Finally, the synthesized PAs and NoCs are evaluated in a simulation system in which a NoC-based CMP is created. The experimental environment takes energy and traffic balance characteristics into consideration. As a result, the most efficient PA and NoC for the CMP are presented.

    Software Development Approach for Discrete Simulators

    Simulation is the most common approach to studying a research problem. Among the several types of simulation, the most common is discrete simulation, which divides the time scale into fixed-length time slots. Depending on the investigated problem, existing simulation packages may be used, or it may be necessary to design and build a custom simulation system. In this paper, we propose a complete pre-study scheme and describe the most commonly occurring implementation problems together with suggested solutions. We also describe how to implement an example simulator in C++.
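
The fixed-length time slot idea can be illustrated with a minimal sketch. It is written in Python rather than the C++ the paper uses, and the queue model, the function name `simulate` and all of its parameters are assumptions made for this illustration, not taken from the paper.

```python
import random
from collections import deque


def simulate(num_slots, arrival_prob, service_slots, seed=0):
    """Slot-by-slot simulation of a single-server FIFO queue.

    Hypothetical model (not from the paper): in each time slot at most
    one job arrives, with probability arrival_prob, and every job needs
    service_slots whole slots of service.
    Returns (completed_jobs, mean_wait_in_slots).
    """
    rng = random.Random(seed)   # private generator for repeatable runs
    queue = deque()             # arrival slot of each waiting job
    remaining = 0               # slots left for the job in service (0 = idle)
    started = 0                 # jobs that have entered service
    completed = 0               # jobs that have finished service
    total_wait = 0              # slots spent waiting before service began

    for slot in range(num_slots):
        # 1. Handle the arrival, if any, for this slot.
        if rng.random() < arrival_prob:
            queue.append(slot)
        # 2. If the server is idle, start the oldest waiting job.
        if remaining == 0 and queue:
            total_wait += slot - queue.popleft()
            remaining = service_slots
            started += 1
        # 3. Advance the job in service by exactly one slot.
        if remaining > 0:
            remaining -= 1
            if remaining == 0:
                completed += 1

    mean_wait = total_wait / started if started else 0.0
    return completed, mean_wait
```

All state changes happen once per slot in a fixed order (arrivals, dispatch, service), which is what distinguishes this style from an event-driven simulator that jumps between event timestamps.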

    Processor Allocation Problem for NoC-based Chip Multiprocessors

    Chip multiprocessors (CMPs) have become the primary approach to building high-performance microprocessors. Such systems require fast and efficient communication that, particularly for large systems, can only be realized using a network on chip (NoC). Allocation and management of on-chip processors are also important factors in achieving high efficiency. Designing the processor allocator, job scheduler and NoC are major issues for future CMPs. In this paper we analyze NoC architectures for CMPs. NoC parameters such as topology, flow control and routing are studied and proposed for CMP implementation. Modern processor allocation algorithms together with scheduling techniques are reviewed and suggested. A hardware structure of NoC-based CMPs is introduced for the recommended solutions. We propose a hardware implementation of the processor allocator and job scheduler, and place them together with the on-chip processors on the same die.

    Fast and Efficient Processor Allocation Algorithm for Torus-based Chip Multiprocessors

    The Processor Allocator (PA) is a crucial factor in modern Chip MultiProcessors (CMPs). A modern CMP uses a Network on Chip (NoC) as the communication technique between cores. Thus, the topology of the implemented NoC also has a significant impact on the CMP’s performance. A good processor allocation technique needs to be fast and ensure the highest possible system utilization. In this paper, we propose a processor allocation technique for such an efficient and fast PA. The PA is driven by the Bit Map Allocation for Torus (BMAT) algorithm, a technique designed for the k-ary 2-cube topology. The proposed BMAT scheme is presented and described along with the new Busy List Allocation for Torus (BLAT), Sorting Allocation for Torus (SAT) and Stack Based Allocation for Torus (SBAT) algorithms. The presented techniques are compared with important previously known schemes for the k-ary 2-mesh topology. The research ideas have been verified by experiments, which are also described in the paper. The simulation results reveal that the proposed processor allocation algorithm for the k-ary 2-cube, used as a technique for the PA, achieves a better allocation time than all other existing algorithms, while the CMP with such a PA is characterized by very high system utilization.
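
The abstract does not describe how BMAT works, so the following is not that algorithm. It is only a minimal sketch of why the topology matters to a bitmap-based allocator: on a torus (k-ary 2-cube) the edges wrap around, so a submesh may straddle the border, which is not possible on a mesh. The function name `find_submesh` and the `wrap` flag are assumptions made for this illustration.

```python
def find_submesh(busy, w, h, wrap):
    """Return (row, col) of the first free h-by-w block, or None.

    busy[r][c] is True when processor (r, c) is allocated.
    wrap=True treats the grid as a torus (indices wrap around);
    wrap=False treats it as a plain mesh.
    Generic first-fit scan, NOT the paper's BMAT algorithm.
    """
    rows, cols = len(busy), len(busy[0])
    if h > rows or w > cols:
        return None
    # On a mesh the block must fit inside the border; on a torus every
    # node is a valid base because the block may wrap around the edges.
    last_row = rows if wrap else rows - h + 1
    last_col = cols if wrap else cols - w + 1
    for r in range(last_row):
        for c in range(last_col):
            if all(not busy[(r + i) % rows][(c + j) % cols]
                   for i in range(h) for j in range(w)):
                return r, c
    return None


# 4x4 grid in which the two middle columns are busy in every row.
grid = [[False, True, True, False] for _ in range(4)]
mesh_result = find_submesh(grid, 2, 2, wrap=False)   # no 2x2 block fits
torus_result = find_submesh(grid, 2, 2, wrap=True)   # columns 3 and 0 are adjacent
```

In this example the free columns 0 and 3 are only neighbours through the wraparound link, so the request succeeds on the torus and fails on the mesh.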

    Hardware Implementation of Processor Allocation Schemes for Mesh-based Chip Multiprocessors

    A well-designed Processor Allocator (PA) is an important factor in modern Chip MultiProcessors (CMPs). It needs to be fast as well as area and energy efficient, because it is only a small component of the CMP. In this paper, we propose an architecture for such an efficient and fast PA. The PA structure is based on a bit map approach and is driven by an Improved First Fit (IFF) algorithm, which is presented and described. Together with the proposed IFF technique, new Improved Adaptive Scan (IAS) and Improved Quick Allocation (IQA) algorithms are introduced, discussed, and compared with important previously known techniques. The presented synthesis results reveal that the proposed PA achieves good frequency results while, at the same time, being characterized by low logic utilization.
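
As a rough illustration of the bit map approach, the sketch below implements the plain, textbook first-fit strategy on a mesh in software. It is not the paper's IFF algorithm or its hardware architecture, whose details the abstract does not give; the names `first_fit`, `allocate` and `release` are assumptions made for this example.

```python
def first_fit(busy, w, h):
    """Return the (row, col) base of the first free h-by-w submesh, or None.

    busy is the bit map: busy[r][c] is True when processor (r, c) is
    allocated. Plain first fit in row-major order, not the paper's IFF.
    """
    rows, cols = len(busy), len(busy[0])
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if all(not busy[r + i][c + j]
                   for i in range(h) for j in range(w)):
                return r, c
    return None


def set_block(busy, base, w, h, value):
    """Mark every processor of the h-by-w block at base as busy/free."""
    r, c = base
    for i in range(h):
        for j in range(w):
            busy[r + i][c + j] = value


def allocate(busy, w, h):
    """Find a free submesh, mark it busy, and return its base (or None)."""
    base = first_fit(busy, w, h)
    if base is not None:
        set_block(busy, base, w, h, True)
    return base


def release(busy, base, w, h):
    """Free a previously allocated submesh when its job finishes."""
    set_block(busy, base, w, h, False)


mesh = [[False] * 4 for _ in range(4)]   # 4x4 mesh, all processors free
a = allocate(mesh, 2, 2)
b = allocate(mesh, 3, 2)
c = allocate(mesh, 2, 2)
d = allocate(mesh, 2, 2)                 # only two free processors remain
release(mesh, a, 2, 2)
e = allocate(mesh, 2, 2)                 # reuses the block freed above
```

A software scan like this costs time proportional to the mesh size times the request size in the worst case, which gives some intuition for why a dedicated hardware allocator is attractive.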

    Matrix Multiplication in Multiphysics Systems Using CUDA

    Multiphysics systems are used to simulate various physical phenomena described by Partial Differential Equations (PDEs). The most popular method of solving PDEs is the Finite Element Method. The simulations require a large amount of computational power, mostly because of extensive matrix processing. These high computational requirements have recently led to the parallelization of algorithms and to the use of Graphics Processing Units (GPUs). To take advantage of GPUs, one of the GPU programming models has to be used. In this paper, the CUDA model developed by nVidia is used to implement two parallel matrix multiplication algorithms. To evaluate the effectiveness of these algorithms, several experiments have been performed. The results have been compared with those obtained by a classic Central Processing Unit (CPU) matrix multiplication algorithm. The comparison shows that matrix multiplication on the GPU significantly outperforms the classic CPU approach.

    Modeling Computational Limitations in H-Phy and Overlay-NoC Architectures

    High performance computing demands constant growth in the computational power and services that modern supercomputers can offer. This requires technological and design advances in the internal structures of multiprocessors as well as novel computing models that account for very high computing demands. One increasingly important requirement of computing platforms is functionality for efficiently managing computational resources, i.e., monitoring them, restricting access to some part of the resources, accounting for computational service, or ensuring reliability and quality of service when some resources are broken or disabled. In this paper, we present a new model describing computational limitations for processing tasks on multiprocessor systems. The model is implemented in Hardware-Physical (H-Phy) and Overlay-Network-on-Chip (Overlay-NoC) architectures. Both architectures and the model are described and analyzed. An experimentation system is also presented, together with simulation assumptions, research results and their analysis. The paper provides complete models of H-Phy and Overlay-NoC structures with the ability to restrict processing resources.

    NA

    This thesis addresses the recruitment of African-American high school students for Naval Reserve Officers Training Corps (NROTC) programs, with an emphasis on programs located at Historically Black Colleges and Universities. The study seeks to determine if the current recruitment process is adequate to meet the needs of the Secretary of the Navy's "Enhanced Opportunities for Minorities Initiative." This initiative is a recruiting strategy designed to increase the number of minorities on active duty and to create a culturally diverse force that reflects the racial composition of the United States. This thesis draws upon information from Pers-61, Navy Recruiting Command, the Center for Navy Education and Training, and NROTC units, as well as a survey conducted with current and former recruiters. Also examined are statements by the Chief of Naval Operations, Chief of Naval Personnel, Commander of Naval Recruiting Command and numerous other Flag Officers in briefings held at the 1998 National Naval Officers Association Conference. A major theme that emerged from the study is that the Navy should enhance its visibility and use more African-Americans in minority recruiting programs for the officer corps. http://archive.org/details/therecruitmentof1094532771 NA U.S. Navy (U.S.N.) author

    Reduction of Knowledge Representation Using Logic Minimization Techniques

    This paper is dedicated to two seemingly different problems. The first concerns information theory and the second is connected to logic synthesis methods. These issues are considered together because of the important task of efficiently representing data in information systems as well as in logic systems. An efficient algorithm for the task of attribute/argument reduction is presented.